Folding Free Energy Determination of an RNA Three-Way Junction Using Fluctuation Theorems

Nonequilibrium work relations and fluctuation theorems permit us to extract equilibrium information from nonequilibrium measurements. They find application in single-molecule pulling experiments where molecular free energies can be determined from irreversible work measurements by using unidirectional (e.g., Jarzynski’s equality) and bidirectional (e.g., Crooks fluctuation theorem and Bennet’s acceptance ratio (BAR)) methods. However, irreversibility and the finite number of pulls limit their applicability: the higher the dissipation, the larger the number of pulls necessary to estimate ΔG within a few kBT. Here, we revisit pulling experiments on an RNA three-way junction (3WJ) that exhibits significant dissipation and work-distribution long tails upon mechanical unfolding. While bidirectional methods are more predictive, unidirectional methods are strongly biased. We also consider a cyclic protocol that combines the forward and reverse work values to increase the statistics of the measurements. For a fixed total experimental time, faster pulling rates permit us to efficiently sample rare events and reduce the bias, compensating for the increased dissipation. This analysis provides a more stringent test of the fluctuation theorem in the large irreversibility regime.


Introduction
Thermodynamic properties of nucleic acids and proteins are commonly studied in bulk assays; however, since the advent of single-molecule technologies, molecular thermodynamics can be determined with unprecedented detail. Techniques such as fluorescence resonant energy transfer (FRET) and force spectroscopy techniques, such as laser optical tweezers, permit us to monitor reactions one molecule at a time [1,2]. By exerting forces at the ends of a biopolymer, it is possible to monitor unfolding/folding reactions from the recorded changes in extension, allowing us to measure folding free energies [3][4][5] and binding energy of ligands to substrates [6][7][8].
Two different kinds of experiments are usually performed: hopping and pulling experiments. In hopping experiments, the unfolding/folding and binding/unbinding reactions are investigated under equilibrium conditions [9][10][11][12][13][14][15]. The main limitation of these experiments is the height of the kinetic barriers that molecules must cross, which, if too high, do not permit sampling of the different conformations in the experimentally accessible time [16,17]. Instead, in pulling experiments, the force is ramped at a given rate and the unfolding/folding and binding/unbinding reactions are followed under nonequilibrium conditions. In pulling experiments, the barrier height is modulated with time, rendering the folding energy landscape accessible even for molecules with high kinetic barriers, such as long RNAs or proteins [18][19][20][21][22].
In pulling experiments, the optical trap is moved back and forth to mechanically unzip the molecule while a force-trap position curve is recorded. Some molecules exhibit a large hysteresis between the unfolding and folding trajectories, indicative of a high kinetic barrier. A useful approach to estimate the folding free energy from nonequilibrium experiments is the Crooks fluctuation theorem (CFT) [23] and its corollary, the Jarzynski equality (JE) [24], which relate the work done under nonequilibrium conditions with the equilibrium folding free energy.
The CFT is a bidirectional method that uses the unfolding and folding trajectories to derive the folding free energy, ∆G. In contrast, the JE is unidirectional, meaning it only considers the unfolding or folding trajectories to estimate ∆G. Here, we investigate the limitations of the CFT and the JE using an RNA three-way junction (3WJ) as a model system [25]. The 3WJ is an interesting example of an RNA with high kinetic barriers to unfold and fold, which shows significant work dissipation in pulling experiments. Besides the CFT and the JE, ∆G 0 has been estimated using Bennet's acceptance ratio (BAR) to compare the different methods. BAR minimizes the statistical variance of the free-energy estimator from the CFT. We compare the three methods under different conditions to study their performance in determining folding free energies from irreversible work measurements.

Synthesis and Single-Molecule Experiments
The molecular construction has been synthesized following the protocol described in reference [3]. Briefly, the DNA sequence encoding the RNA 3WJ (Merck) is inserted between EcoRI (New England Biolabs, NEB, Ipswich, MA, USA) and HindIII (NEB) restriction sites of pBR322 vector (NEB) and cloned by ultra-competent E. coli cells (Invitrogen). Initially, T7 primers were used for PCR amplification (KOD polymerase, Merck, Township, NJ, USA) of the DNA containing the 3WJ sequence flanked by the handles. Next, RNA is synthesized by in-vitro transcription (T7 megascript, Merck). The transcribed RNA contains the 3WJ sequence flanked by 527 and 599 bases at the 3'-end and 5'-end, respectively. Hybrid DNA-RNA handles are made by hybridizing the RNA strand with PCR-amplified complementary DNA sequences. Handles are labeled with a biotin (5'-end) and a tail of digoxigenins (3'end) that specifically bind to streptavidin-coated (SA) beads and anti-digoxigenin-coated (AD) beads. Schematics of the experimental setup are shown in Figure 1A. The molecular construction is tethered between the SA (2.1 µm Kisker Biotech) and AD beads (3.0-3.4 µm, Kisker Biotech). The SA bead is held by air suction at the tip of a glass micropipette, while the AD bead is captured in the optical trap.
Here, we have carried out non-equilibrium pulling experiments using a Mini-Tweezers optical setup [19] at room temperature (298 K) and monovalent salt conditions (300 mM NaCl) in a 10 mM HEPES buffer containing 1mM EDTA and 0.01% NaN 3 . The optical trap is moved up and down between two selected positions at a given pulling speed ( Figure 1B). At the initial (λ 0 ) and final (λ 1 ) positions, the 3WJ is folded at a low force ( f 0 ) and unfolded at a high force ( f 1 ), respectively. Upon moving the optical trap up (down), unfolding (folding) events are observed as sudden force jumps in the force-distance curves (FDCs) (black and gray trajectories in ( Figure 1B). The work, W, for the unfolding and folding trajectories is defined as the area below the FDCs between λ 0 and λ 1 [26].

Free-Energy Difference Determination: A Reminder
In this section, we summarize the main formulae for determining the free-energy difference between the folded and the unfolded RNA, (∆G FU ), from non-equilibrium pulling experiments. In the presence of a misfolded state, the standard Crooks fluctuation theorem (CFT) [23] and its corollary, the Jarzynski equality (JE) [24], are not directly applicable. Instead, one must use the extended versions of the CFT and the JE (hereafter referred to as ECFT and EJE) that consider the formation of different competing structures besides the native one [21,27,28]. In addition, we have also used the Bennet's acceptance ratio (BAR) method [29] to extract ∆G. The different energy estimates are then compared to predictions computed using Mfold [30] and the Vienna package [31].

The Extended Crooks Fluctuation Theorem (ECFT)
The CFT establishes a symmetry between the work distributions measured in a nonequilibrium process and its time-reverse, conditioned to start in full (Boltzmann-Gibbs) equilibrium at the beginning of each process. In the ECFT, this condition is replaced by the more general partial equilibrium condition; at the beginning of each process, the system is partially equilibrated in a given subset of the conformational space. In the pulling experiments shown in Figure 1, the RNA folds into two distinct conformations (subsets), native (N) and misfolded (M) [32,33]. Therefore, at the beginning of the pulling cycle, the RNA is partially equilibrated in N or M, and the ECFT must be applied. For simplicity, we will consider the N state, but the case for M is identical. Let P → (W) and P ← (−W) denote the partial unfolding and folding work distributions conditioned to start in N at λ 0 and end in U at λ 1 . Hereafter, the subscripts indicate the unfolding (→) and folding (time-reverse, ←) processes. The extended Crooks fluctuation theorem (ECFT) for the trajectories that are in N at λ 0 reads [27]: where β = 1/k B T, with k B being the Boltzmann constant and T the temperature. ∆G NU is the equilibrium free energy difference between N and U. φ → (φ ← ) are the fraction of trajectories that start at N (U) in λ 0 (λ 1 ) and end in U (N) at λ 1 (λ 0 ) during the unfolding (→) and folding (←) processes. Notice that in the standard CFT, φ → = φ ← = 1, because of the full equilibrium condition. For our RNA molecule, φ → = 1 (the RNA is always unfolded at {λ 1 , f 1 }), whereas φ ← < 1 because the molecule can fold into M ( Figure 1B). The work value where both distributions cross, P → (W ) = P ← (−W ), will be denoted as W and is given by is denoted as the ECFT correction. φ ← is estimated by classifying unfolding and folding FDCs in two subsets, N and M, depending on which state the RNA folds into upon refolding. The classification is made on the basis of the FDC pattern ( Figure 1B). Equation (1) also holds for the M-subset of trajectories. The crossing work value in Equation (1) defines the ECFT estimate for ∆G NU .

The Extended Jarzynski Equality (EJE)
A corollary of the CFT is the JE. Its extended version reads, with C = k B T log(φ ← ), and ∆G NU being the two energies estimates obtained by exponentially averaging ( . . . ) over each set of n U(F) unfolding → (folding ←) trajectories, respectively. In practical cases, n U(F) is finite and the exponential average is biased for a finite number of experiments. The bias of the EJE is defined as, where the averages . . . are taken over different sets of n samples. For FDCs presenting high hysteresis ( Figure 1B), the unfolding ∆G NU → and folding ∆G NU ← estimates differ. To overcome this, it is preferable to combine the unfolding and folding work measurements (e.g., using the ECTF Equation (1) and BAR; see next) to estimate ∆G NU .

The Bennet's Acceptance Ratio (BAR)
The Bennett's acceptance ratio (BAR) method minimizes the statistical variance of the free-energy estimator given by the CFT. Bennet demonstrated that the function Φ(W) = (1 + (n U /n F ) exp [β(W − ∆G NU )]) −1 is the one that minimizes the statistical variance of the estimator ∆G [29]. Rearranging Equation (1) and multiplying it by Φ(W), we obtain: Integrating over W give us the expected values of Φ(W) over the work distributions: Taking logarithms and defining u = ∆G NU − C, and the functions z F (u) and z U (u): with W → i , W ← i being the work values measured in the unfolding and folding processes. In Equation (4), n U and n F denote the number of unfolding and folding trajectories, respectively. With these definitions, the ECFT takes the form: The solution to Equation (5), u , defines the BAR free energy estimate, ∆G NU = u + C.

The Matching Method
An alternative and simple method to estimate ∆G NU , useful in those cases where work distributions do not cross, is the so-called matching method. In this method, we determine the value of ∆G NU by imposing continuity between the measured P → (W) and the reconstructed one from P ← (−W) using Equation (1): The value of ∆G NU that best matches the two work distributions in the crossing work region (around W = ∆G NU − C) defines the matching estimate [22]. Below, we will focus on the previous three estimates (ECFT, EJE, BAR), but will use Equation (6) to illustrate the matching method.

Results
We have carried out pulling experiments at two different pulling speeds: 50 nm/s and 200 nm/s. The recorded FDCs show two different patterns ( Figure 1B) corresponding to two different folded structures: native and misfolded. We have determined the folding free-energy of the native state (N), ∆G N , using the three estimators described in Section 2.2 (ECFT, EJE, BAR). In the case of the misfolded state, as shown in Figure 1B, the trajectories for the misfolded state do not present a significant hysteresis, due to its quasi-reversibility in unzipping experiments. Therefore, we can just derive the free energy by taking the linear response formula to the first order, ∆G (1) [34]. For the misfolded state, we found ∆G From the ECFT and BAR, we obtain ∆G Averaging all estimators, we measured ∆G M = 49(1)k B T. All these results agree with the prediction using both Mfold and Vienna packages, which is ∆G 0 M = 49(2)k B T at our salt conditions. Results for the misfolded state can be found in Table A2 and are illustrated in Figure A1.
To determine ∆G NU from the FDCs, we classify pulling trajectories into two sets (N and M), select the N subset, estimate the fraction of trajectories, φ ← , and measure the partial work distributions (P → (W), P ← (−W)). From the work W, we have subtracted the energy contributions coming from the bead displacement, the stretching of the DNA-RNA handles, and the 3WJ single-stranded RNA (ssRNA). To do so, we have followed the methodology described in [22,35]. Briefly, the elastic response of the ssRNA and hybrid DNA-RNA handles is calculated using the inextensible worm-like chain model [36]. The elastic parameters (p for persistence length and d b for the interphosphate distance) used in this study are: p = 0.9 nm and d b = 0.65 nm/base for the ssRNA and p = 40 nm and d b = 0.3 nm/basepair for the hybrid handles [35,[37][38][39][40][41]. The folded hairpin is modeled as a freely-rotating dipole of 2 nm length [37]. The bead contribution has been estimated by solving the equation for λ( f ) by applying the elastic models and using a harmonic optical trap with a cubic non-linear correction reported in previous experiments [35,42].
In Figure 2A,B (left) we show the unfolding (solid lines) and folding (dashed lines) work distributions for all investigated molecules at the two pulling speeds. From Equation (1), we derive ∆G NU from the crossing work value. The crossing point has been estimated by fitting a generic kernel probability density distribution to the experimental histograms. In Figure 2, we have subtracted the crossing work value W from the partial work W. We note that the unfolding work distributions are broader than the folding ones, suggesting that the transition state upon folding is closer to the native state. To characterize the work distributions, we have investigated their Gaussianity. To do so, we have defined a parameter R that relates the dissipated work, W dis = | W − W |, with the variance of the work, σ 2 W : R = σ 2 W /(2k B T W dis ); according to the ECFT, R = 1 indicates perfect Gaussian behavior. For the folding work distributions, we find R = 0.9(0.1) and R = 1(0.1) for the 50 nm/s and 200 nm/s pulling rates, respectively. In contrast, the unfolding work distributions have R = 2.0(0.2) and R = 2.1(0.1) for 50 nm/s and 200 nm/s, respectively. This result indicates that the unfolding and folding work distributions have different behavior; only the folding distributions can be well approximated with a Gaussian function, which is suggestive of quasi-reversible folding. The average work values ( W ), variance (σ), and average dissipated work ( W dis ) for the investigated loading rates are summarized in Table 1. The results for each studied molecule are gathered in Table A1 (Appendix A).  The biases, B n , of the EJE estimator (Equation (3)) for different n values are shown in the right plots of Figure 2A,B. We have investigated data subsets of different sizes, each containing a random sample of work values. As expected, as the size of the subset grows, the bias decreases [43]. The free energy estimations using EJE Equation (2) are tabulated in Table 2. The average work values W → ( W ← ) overestimate (underestimate) W because the average dissipation W dis → = W → − W ( W dis ← = W − W ← ) is always positive.
As mentioned before, crossing work values (W ) allow us to estimate the energy difference, ∆G NU = W + C. In Figure 3A, we tested ECFT Equation (1) using fits to the generic kernel densities by plotting the ratio between the unfolding and folding work distributions in a log-normal representation. As all energy values are presented in k B T units, the slope of Equation (1) is expected to be equal to 1 (dark solid line). The average slope (dashed line) equals 0.9 ± 0.1, in agreement with the expected behavior (slope values for each molecule are shown in Table A1). In addition, we checked the ECFT by applying the matching method (Equation (6)). If the ECFT holds, we expect to see continuity between the measured (solid symbols) and the reconstructed P → (W) using Equation (6) (empty symbols) ( Figure 3B). Although there is continuity, we also appreciate a change in the slope around the crossing point. Moreover, we have estimated ∆G NU using BAR ( Figure 3C). The value of u = ∆G NU − C is defined as the intersection (empty symbols) between the identity line (dark solid line) and the lines defined by Equation (5) (colored lines). As can be seen, the function z F (u) − z U (u) is almost constant in a range of ∼ 4k B T around the crossing point, u . Figure 3C shows the results using BAR for ∆G NU for all molecules and pulling rates. Results are summarized in Table 2.  Finally, we tested the CFT over the pulling-cycle protocol defined by connecting the unfolding and folding processes, as shown in Figure 4. In this protocol, the trajectories are defined as the closed cycle determined by the unfolding and the subsequent folding trajectory. Therefore, the initial and final states are the same, i.e., the native molecule at {λ 0 , f 0 }. By definition, in a cyclic protocol with a single control parameter, the forward and reverse work distributions are identical, i.e., P → (W) = P ← (−W) ≡ P(W) in Equation (1). The work for a single realization of a closed cycle is defined as W = W U + W F , with W {U,F} being the unfolding and folding work measurement values. If we restrict the analysis to native cycles (i.e., trajectories that start and end in N), the free energy difference for the closed cycle equals zero, ∆G NN = 0 and the fractions φ → = φ ← < 1, so C cancels out and the ECFT Equation (1) takes the form: In Figure 4 (bottom panels), we show the work histograms (dark solid boxes) for the cyclic protocol, P(W). Work histograms are shown for the two pulling speeds: 50 nm/s (bottom left), and 200 nm/s (bottom right). These histograms have been calculated considering all the trajectories from all molecules. Notice that the boxes do not reach W = 0, and P(W) and P(−W) do not intersect at W = 0, so we are not able to test the CFT as in Figure 3A. To circumvent this, we have combined the datasets of unfolding (W U ) and folding (W F ) work values. We have built a new dataset for the work cycle (W = W U + W F ) by drawing W U and W F independently from each dataset. While the original histograms in Figure 4 are calculated with N closed trajectories, by mixing each unfolding trajectory with all the folding trajectories, we obtain a total of N 2 observations, which allow us to extend the tails of the histograms by increasing the number of trajectories from 457 (838) to 208,849 (702,244) at 50 (200) nm/s. The new histograms are displayed as error bars (red and green) in Figure 4, bottom. Note that with this procedure, we obtain virtual cycles that, despite never occurring, would be possible but extremely rare. As expected, the obtained distribution reasonably fits the original histogram. In exchange, we have obtained a few values for W < 0 populating the leftmost tail of the histogram for the molecules pulled at 200 nm/s. These values allowed us to construct a P(W)/P(−W) and test Equation (1). The result is shown in the bottom right panel of Figure 4, inset. The dark solid line represents the unity slope. On the other hand, the same procedure over the dataset for the pulling velocity of 50 nm/s did not produce those values. Although a lower speed lowers the dissipation, this comes at the price of reducing the number of pulls for a given experimental time. Our results demonstrate that, for a given experimental time, it is preferably to increase dissipation by increasing the pulling speed and the number of cycles.
In Table 3, we show the main parameters of the work distributions in the cyclic protocol.

Discussion
We have applied the Jarzynski equality (JE) and the Crooks fluctuation theorem (CFT) to determine the folding free energy of a native RNA three-way junction (3WJ), ∆G NU , from pulling experiments. The RNA 3WJ exhibits hysteresis between the unfolding and folding trajectories (see Figure 1B), which translates into a significant dissipation upon mechanical unfolding. The RNA 3WJ shows a misfolded two-hairpin structure [44] (Figure 1), which is predicted with the Vienna package [31]. Here we have focused on deriving the free energy of the native 3WJ only, where the crossing region between the unfolding and refolding work distributions (P → (W), P ← (−W)) is less populated and the determination of the folding free energy is more challenging.
In the presence of a misfolded structure, the standard JE and CFT must be corrected with the term C = k B T log(φ ← ), where φ ← is the fraction of trajectories that fold into the native structure upon refolding. If there is no misfolding, φ ← = 1 and C = 0. We applied three estimators to the experimental data (Section 2.2): the extended Jarzynski equality (EJE), the extended Crooks fluctuation theorem (ECFT), and Bennet's acceptance ratio (BAR). The average dissipated work for the unfolding process is ∼30 k B T, which is almost half of the folding free energy (at zero force) predicted by Mfold [30,45]; ∆G 0 = 67(3) k B T at [NaCl] = 1 M. This value must be corrected for salt dependence [37] to obtain a value of ∆G 0 = 62(3) k B T in our experimental conditions. Combining the different estimators, we obtain ∆G 0 = 61(1) k B T (see Table 2, bold), in agreement with the expected value.
We have applied the EJE to the unfolding and folding trajectories independently. In Figure 2 (right panels), we quantified the bias [46] of the EJE, B, and studied the convergence of the estimator with the number of pulls, following reference [43]. Despite the large average dissipation (∼20-30 k B T for unfolding and ∼13 k B T for refolding), the EJE estimator converges after one hundred pulls. Averaging estimators for unfolding and refolding, we obtain |∆G NU → − ∆G NU ← |∼12 (8) k B T for 50 (200) nm/s (see Table A1). We remark that faster pulling velocities allow us to obtain more pulls before the tether breaks, and a less biased energy estimator.
We have combined unfolding and refolding measurements in the ECFT (Equation (1)). This method yields a better estimate for the folding free energy (see Table 2). However, bidirectional estimators require that P → (W) and P ← (−W) are sampled around the crossing region. A way to test the ECFT is shown in Figure 3, where a log-normal representation of Equation (1) shows a straight line with a slope close to 1 (see Table A1 for individual slopes). Further validation of Equation (1) is obtained with the matching method shown in Figure 3B, where continuity between the measured P → (W) and the reconstructed one from P ← (W) using Equation (6) is imposed on the data to derive ∆G NU .
Bennett's acceptance-ratio (BAR) method provides a third estimator of ∆G NU . In Figure 3C, we plot z F (u) − z U (u) obtained by solving Equation (5). Although the difference is not flat throughout the u-axis, it is reasonably constant, with a range of 10 k B T around u , which give us confidence in the BAR results.
The RNA 3WJ was chosen because, despite its significant dissipation, there is a visible crossing point between unfolding and folding work distributions. This feature makes it an excellent candidate for comparing the different estimators.
Regarding the estimators, bidirectional methods (i.e., the extended Crooks fluctuation theorem (ECFT) and the Bennet acceptance ratio (BAR)) are more predictive because they combine data from unfolding and refolding in a unique formula. In contrast, the extended Jarzynski equality (EJE) uses the unfolding and folding work data separately. In general, when work distributions cross, it is advisable to take the crossing value as the free energy estimator. This includes the situation where dissipation is low (a few k B T, for example, in the case of the misfolded structure). However, in this case, linear response theory provides good estimations for the free energy [34] (see Section 3, Results). If dissipation is large and the crossing work value is inaccessible, rare events must be sampled to reconstruct the leftmost (unfolding) and rightmost (folding) tails of the work distributions. In this case, it is advisable to increase the pulling speed to collect more trajectories for a given total experimental time (to more efficiently sample rare events) and apply the BAR method, which minimizes the statistical variance of the free energy estimator.
It is important to remark that, in the linear and low dissipation regime, work distributions tend to be Gaussian. In this case, the variance σ 2 W and the mean dissipated work ( W dis ) follow linear response theory, so σ 2 W = 2k B T W dis . However, for dissipation comparable to k B T, drift effects, finite measurement bandwidth, and experimental errors set a lower bound to the variance, so σ 2 W > 2k B T W dis . Strictly speaking, Crooks FT does not hold; however, the crossing method and BAR still give good ∆G estimates. In particular, the Jarzynski equality overestimates the weight of the tails, yielding a negative free energy bias. This leads to misleading predictions; for instance, the unidirectional Jarzynski free energy estimator of the unfolding process is lower than the refolding one.
We want to remark that ∆G is an equilibrium quantity that does not depend on the pulling rate. Therefore, by collecting a sufficiently large number of trajectories, all the estimators would give the same ∆G value. However, as shown in Table 1, dissipation increases with the pulling rate, which would imply that more trajectories are necessary to estimate ∆G correctly. However, for a fixed total experimental time, faster pulling rates permit us to sample rare events much better, compensating for the increased dissipation.
Finally, we considered a new method of data analysis in which unfolding and folding trajectories are mixed to obtain full pulling cycles with ∆G = 0. Forward and reverse work distributions are equal and the crossing equals W = 0 by definition. The work histogram reaches W = 0 for the largest pulling speed (Figure 4, 200 nm/s), and Equation (7) is fulfilled (inset). Future studies might address methods to increase the statistics around the crossing region to extract valuable information about the molecular folding energy landscape.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.